Classification of categorical and numerical data on selected subset of features

Author

  • Zaher Al Aghbari
Abstract

Many data mining techniques use the entire feature space in the classification process. This feature space may contain irrelevant or redundant features that reduce classification accuracy. This paper presents an approach for selecting the subset of features most relevant to the classification task. We use a wrapper approach to search for a relevant subset of features, which is then used to classify two datasets: a categorical teachers' dataset and a numerical image dataset. The Naïve Bayesian algorithm and the K-Nearest Neighbor algorithm are used to classify, and to estimate classification accuracy on, the categorical and numerical data, respectively. The experimental results for both datasets indicate that classification accuracy improves when irrelevant features are removed and only the relevant subset of the feature space is used.
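The wrapper idea described in the abstract can be illustrated with a minimal sketch. The abstract does not say which search strategy the paper uses, so greedy forward selection is an assumption here, as are the leave-one-out scoring, the choice of k = 3, the toy data, and every function name (`knn_predict`, `loo_accuracy`, `forward_select`). The sketch covers only the numerical case with a nearest-neighbor classifier; it is not the author's implementation.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, features, k=3):
    """Majority vote among the k nearest training rows, using Euclidean
    distance restricted to the given feature indices."""
    dists = sorted(
        (math.sqrt(sum((row[f] - x[f]) ** 2 for f in features)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def loo_accuracy(X, y, features, k=3):
    """Leave-one-out accuracy of k-NN when only `features` are used."""
    if not features:
        return 0.0
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if knn_predict(train_X, train_y, X[i], features, k) == y[i]:
            correct += 1
    return correct / len(X)


def forward_select(X, y, evaluate):
    """Greedy forward wrapper search (an assumed strategy): repeatedly add
    the feature whose inclusion gives the best score from `evaluate`, and
    stop as soon as no candidate improves on the current best score."""
    remaining = set(range(len(X[0])))
    selected, best = [], 0.0
    while remaining:
        scored = [(evaluate(X, y, selected + [f]), f) for f in sorted(remaining)]
        top_score, top_f = max(scored, key=lambda t: t[0])
        if top_score <= best:
            break
        selected.append(top_f)
        remaining.remove(top_f)
        best = top_score
    return selected, best


# Made-up toy data: feature 0 separates the classes, feature 1 is noise
# on a much larger scale, so it dominates the distance when included.
X = [
    [1.0, 50.0], [1.2, 10.0], [0.9, 90.0], [1.1, 30.0],
    [5.0, 55.0], [5.2, 15.0], [4.9, 85.0], [5.1, 35.0],
]
y = [0, 0, 0, 0, 1, 1, 1, 1]

selected, best = forward_select(X, y, loo_accuracy)
print("selected features:", selected, "accuracy:", best)
print("accuracy with all features:", loo_accuracy(X, y, [0, 1]))
```

On this toy data the search keeps only feature 0, and the accuracy with both features is lower than with the selected subset, which mirrors the effect the abstract reports. For categorical data, the same `forward_select` loop could in principle be given an evaluator built on a Naïve Bayes classifier instead of `loo_accuracy`.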

Similar resources

A New Framework for Distributed Multivariate Feature Selection

Feature selection is an important issue in the classification domain. Selecting good features by maximizing relevance to the class label and minimizing redundancy among features improves classification accuracy. However, most current feature selection algorithms work only in a centralized manner. In this paper, we suggest a distributed version of the mRMR featu...

Online Streaming Feature Selection Using Geometric Series of the Adjacency Matrix of Features

Feature Selection (FS) is an important pre-processing step in machine learning and data mining. All the traditional feature selection methods assume that the entire feature space is available from the beginning. However, online streaming features (OSF) are an integral part of many real-world applications. In OSF, the number of training examples is fixed while the number of features grows with t...

Feature Selection Using Multi Objective Genetic Algorithm with Support Vector Machine

Different approaches have been proposed for feature selection to obtain a suitable feature subset from among all features. These methods search the feature space for feature subsets that satisfy some criteria or optimize several objective functions. The objective functions are divided into two main groups: filter and wrapper methods. In filter methods, feature subsets are selected according to some measu...

Classification of polarimetric radar images based on SVM and BGSA

Classification of land cover is one of the most important applications of radar polarimetry images. The purpose of image classification is to classify image pixels into different classes based on vector properties of the extractor. Radar imaging systems provide useful information about ground cover by using a wide range of electromagnetic waves to image the Earth's surface. The purpose ...

A Clustering Algorithm for Categorical Data Based on Combining Measures

Clustering is one of the main techniques in data mining. Clustering is a process that partitions a data set into groups. In clustering, the data within a cluster are as close to each other as possible, and the data in different clusters differ as much as possible. Clustering algorithms are divided into two categories according to the type of data: Clustering algorithms for numerical data and clustering algor...

Publication date: 2012